ABSTRACT

The gene- or fragment-specific detection of newly 
recognized deoxyribonucleic acid (DNA) base 5- 
hydroxymethylcytosine (5hmC) will provide insights 
into its critical functions in development and dis- 
eases, and is also important for screening 5hmC- 
rich genes as an indicator of epigenetic states, 
pathogenic processes and pharmacological re- 
sponses. Current analytical technologies for gene- 
specific detection of 5hmC are heavily dependent on 
glucosylated 5hmC-resistant restriction endonucle- 
ase cleavage. Here, we find that boronic acid (BA) 
can inhibit the amplification activity of Taq DNA poly- 
merase for replicating glucosylated 5hmC bases in 
template DNA by interacting with their glucose moi- 
ety. On the basis of this finding, we propose for 
the first time a BA-mediated polymerase chain re- 
action (PCR) assay for rapid and sensitive detec- 
tion of gene- or fragment-specific 5hmC without 
restriction-assay-like sequence limitations. To opti- 
mize the BA-mediated PCR assay, we further tested 
BA derivatives and show that one BA derivative, 2-(2'- 
chlorobenzyloxy) phenylboronic acid, displays the 
highest inhibitory efficiency. Using the optimized as- 
say, we demonstrate the enrichment of 5hmC in an 
intron region of Pax5 gene (a member of the paired 
box family of transcription factors) in mouse embry- 
onic stem cells. Our work potentially opens a new 
way for the screening and identification of 5hmC-rich 
genes and for high throughput analysis of 5hmC in 
mammalian cells. 



INTRODUCTION 

5-hydroxymethylcytosine (5hmC) is a recently re-discovered 
deoxyribonucleic acid (DNA) base that is converted from 
the well-characterized epigenetic mark 5-methylcytosine 
(5mC) (1-3). The conversion of 5mC to 5hmC is cat- 
alyzed by Tet family dioxygenases in mammalian cells (2). 
5hmC functions critically in nuclear reprogramming, devel- 
opment and diseases (4-13). Moreover, the level of 5hmC 
has been observed to be altered in several diseases, includ- 
ing hematopoietic malignancies and a broad range of solid 
tumours (12,13). A lack of 5hmC was implicated as a useful 
biomarker for cancer diagnosis (13). Interestingly, both nu- 
tritional [e.g. vitamin C (14)] and environmental factors [e.g. 
redox-active quinones (15)] affect the activity of Tet fam- 
ily dioxygenases mediating 5mC oxidation and the cellular 
level of 5hmC. These observations may indicate that 5hmC- 
varying or -rich genes or sequences could be exploited as in- 
dicators or biomarkers of epigenetic states, pathogenic pro- 
cesses, pharmacological responses and environmental expo- 
sures. Therefore, the development of analytical technologies 
is critical for the detection of 5hmC distribution in the con- 
text of sequences or genes. 

It is very difficult to discriminate the rare 5hmC from
the abundant 5mC because of their similar chemical structures.
For example, 5hmC is indistinguishable from 5mC in bisulfite
sequencing (16) and in most 5mC-sensitive restriction
assays (17,18). Interestingly, 5hmC can be selectively
glucosylated by T4 phage β-glucosyltransferase (β-GT) to form
β-glucosyl-5-hydroxymethylcytosine (5ghmC) (19), which
is resistant to the cleaving activities of some
methylation-insensitive restriction enzymes, e.g. MspI
(https://www.neb.com/nebecomm/tech_reference/epigenetics/epimark.asp).
Therefore, 5hmC (in its glucosylated state) but not 5mC
can be retained at short and cleavable sites of restriction
endonucleases for the specific genes or sequences (20-23).
The fragments obtained by the digestion of glucosylated
genomic DNA with combined restriction enzymes
(MspI/HpaII) could be enriched by ligation-mediated
polymerase chain reaction (PCR) for identification of
5hmC loci (24). However, the applications of these assays
are conditional: (i) the target DNA region must contain
the specific cleavable sites (e.g. CCGG for MspI) and (ii)
the 5hmC must be located precisely at the cleavable sites.
Therefore, the restriction endonuclease assays are only
applicable to certain loci or sequences and cannot identify
5hmC in a number of targeted but non-cleavable DNA
regions, where it may be a critical factor affecting gene
activity.

A number of affinity trapping approaches have also been
developed for the 5hmC analysis of genomic DNA using
antibodies (25,26), JBP1 (27) and biotinylation (28,29). The
diol of the glucosylated 5hmC could be used to develop
oxidation chemistry for 5hmC capture (29). Meanwhile,
several techniques for base-resolution analysis of 5hmC were
developed, including Tet-assisted bisulfite sequencing (30)
and oxidative bisulfite sequencing (31).

In this work, we attempted to develop a novel and generic
PCR approach to perform a fast and high throughput
analysis of 5hmC at gene-specific or fragment-specific levels.
Among currently known DNA bases in mammalian cells
(32,33), only 5hmC can be specifically modified by β-GT
to glucosylated 5hmC. However, the glucosylation of 5hmC
in genomic DNA cannot block the replicating action of
DNA polymerases (see the Results section). Therefore, we
explored the further modification of glucosylated 5hmC in
genomic DNA without introducing any DNA damage or
other byproducts. We hypothesized that an appropriately
structured boronic acid (BA) can selectively bond with the
vicinal cis-diol group in glucosylated 5hmC and increase
the size of 5ghmC, leading to specific stalling of DNA
polymerases on 5ghmC in template DNA (Scheme 1).
Although BA and its derivatives have been exploited to
separate and capture cis-diol-containing biomolecules (34-38),
it remains unknown whether this interaction can block the
replication of 5ghmC (glucosylated 5hmC) by DNA
polymerase. Therefore, we tested whether BA and its
derivatives can block the amplification activity of DNA
polymerase at 5hmC loci.

MATERIALS AND METHODS 

Materials and chemicals 

All of the unmodified oligos were synthesized and puri- 
fied [by high performance liquid chromatography (HPLC)] 
by Sangon Biological Engineering Technology and Services 
(Shanghai, China). T4 DNA ligase, T4 polynucleotide ki- 
nase (T4 PNK), calf intestinal alkaline phosphatase, de- 
oxyribonuclease I and adenosine triphosphate (ATP) were 
purchased from New England BioLabs (Ipswich, MA, 
USA). Snake venom phosphodiesterase, boronic acid (BA), 
phenylboronic acid (PBA), 3-chlorophenylboronic acid (3- 
CPBA), 2-(2'-chlorobenzyloxy) phenylboronic acid (2-CB- 
PBA) and 3-(Dansylamino) phenylboronic acid (3-D-PBA) 
were purchased from Sigma-Aldrich (St. Louis, MO, USA).
5mdCTP and 5hmdCTP were purchased from Zymo Research
(Irvine, CA, USA), and deoxyguanosine triphosphate
(dGTP), thymidine triphosphate (TTP), deoxycytidine
triphosphate (dCTP) and GoTaq hot start polymerase were
ordered from Promega (Madison, WI, USA). Other biochemicals
were purchased from Sigma-Aldrich (St. Louis, MO, USA)
or Fisher Scientific (Pittsburgh, PA, USA). The
5hmC-ds83mers with a sequence of 5'-TTTCCTACCT
TAAGATCCTT CCAGTCTC CGCCGCG CAGTG
TTACCCTTAG AGCTCATACC ATTCGCCAAT
TTCTTCGCAC GTT-3' (only one strand shown) were synthesized
and purified by TaKaRa Biotechnology Co., Ltd. (Dalian,
China).

Scheme 1. Illustration of the BA-mediated inhibition of the amplification
activity of DNA polymerase. First, the 5hmC in dsDNA is exclusively
glucosylated; then, one boronic acid molecule is selectively bonded with the
vicinal diol in the conjugated glucose of 5hmC, inhibiting the replication
of the 5hmC-containing DNA and leading to a reduced PCR yield (left).
The glucosylated 5hmC-containing DNA can be normally amplified by
PCR without boronic acid (right). The symbol ' indicates the
substituted group for boronic acid.

Synthesis, purification and characterization of 5hmC/5mC-containing
oligos

To synthesize the 5hmC-containing oligodeoxynucleotide
5'-TTTTCGAATTCCTCCCTGTA(5hmC)GTTTT-3'
(5hmC-oligo 1, 26mer), a 41mer template (T,
5'-TTACTCATCATTTTTAAAACGTACAGGGAGGAATTCGAAAA-3')
and a 20mer primer (P, 5'-TTTTCGAATTCCTCCCTGTA-3')
were used (Supplementary Figure S1). With the assistance of
the 41mer template, the 20mer primer (25 µM) was extended by
six nucleotides using Taq DNA polymerase and limited dNTP
(deoxynucleotide triphosphate) substrates. The extension
reaction solution was prepared by mixing 100-µM 41mer
template T (4.0 µl), 100-µM 20mer primer P (5.0 µl), 5.0
units/µl GoTaq hot start polymerase (0.5 µl), 5× GoTaq
hot start polymerase reaction buffer (4.0 µl) and 25-mM
MgCl2 (1.6 µl) with 50-mM 5hmdCTP (0.4 µl), 50-mM
dGTP (0.4 µl), 100-mM TTP (0.8 µl) and nuclease-free
water (3.3 µl). The total volume was about 20 µl. No dCTP
or dATP (deoxyadenosine triphosphate) was used in the
extension reaction. The lack of dATP ensured that the
extension of primer P stopped exactly after six nucleotides.
The reaction was performed using a MyCycler Thermal
Cycler (Bio-Rad Laboratories, Hercules, CA, USA). The
extension program was set as: initial heating at 94°C for 5
min, denaturation at 94°C for 1 min, annealing at 53°C for
30 s, extension at 72°C for 90 min and final chilling at 4°C
for 5 min.
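
The stopping logic of this limited-dNTP extension can be checked with a short sketch. The code below is only an illustration, not part of the published protocol: the function names are hypothetical, 5hmdCTP is represented simply as "C", and the polymerase is assumed to halt at the first position whose complementary dNTP is missing.

```python
# Minimal sketch of primer extension with a restricted dNTP pool.
# Assumption: the polymerase stops at the first template position
# whose complementary dNTP is absent from the reaction.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a 5'->3' DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def extend_primer(template, primer, available_dntps):
    """Return the bases added to the primer (5'->3').

    template and primer are written 5'->3'; available_dntps is the
    set of bases supplied as triphosphates.
    """
    full_length = reverse_complement(template)
    if not full_length.startswith(primer):
        raise ValueError("primer does not anneal to the 3' end of the template")
    added = []
    for base in full_length[len(primer):]:
        if base not in available_dntps:
            break  # required dNTP is missing, extension stops here
        added.append(base)
    return "".join(added)


# Sequences taken from the text above.
TEMPLATE = "TTACTCATCATTTTTAAAACGTACAGGGAGGAATTCGAAAA"
PRIMER = "TTTTCGAATTCCTCCCTGTA"

# "C" stands for 5hmdCTP; dATP is deliberately left out.
added = extend_primer(TEMPLATE, PRIMER, {"C", "G", "T"})
product_length = len(PRIMER) + len(added)
print(added, product_length)
```

With the template and primer given in the text, the first added base pairs with the template G (the modified cytosine position), and extension halts where the template would require dATP.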

Two control versions of oligo 1 (5C-oligo 1 and 5mC-oligo 1)
were synthesized following the same procedure except
that 5hmdCTP was replaced with dCTP and 5mdCTP,
respectively.

The three synthesized oligo 1s were purified using 16%
denaturing polyacrylamide gel electrophoresis (PAGE) at
200 V for 80 min. The denaturing gel was prepared with
7.0-M urea. 1× TBE buffer [90-mM Tris-borate, 2-mM
ethylenediaminetetraacetic acid (EDTA), pH 8.3] was used
for PAGE separation. The bands (stained by ethidium
bromide) were visualized using a GBOX/HR-E-M gel
documentation system (Syngene, Cambridge, UK) and cut from
the gel for recovery. The recovered gel slices were crushed in
collection vials and soaked in 2.0 volumes of polyacrylamide
gel elution buffer (0.5-mol/l ammonium acetate, 10-mmol/l
magnesium acetate, 1.0-mmol/l EDTA, pH 8.0), followed
by shaking in a rotary shaker in the dark overnight at 37°C.
The supernatants were aspirated and transferred into
other vials, followed by the addition of 2.0 volumes of
ice-cold ethanol and 0.3 volume of 100-mM MgCl2. The
solutions were placed at −20°C for 1.5 h (to facilitate the
precipitation of the oligodeoxynucleotides) and then centrifuged
at 12 000 rpm for 30 min at 4°C. The pellets were collected
and washed once or twice with 70% ethanol. The collected
pellets were air-dried, re-suspended in ddH2O and
quantified using a NanoDrop 2000 (Thermo Fisher Scientific Inc.,
Waltham, MA, USA) at 260 nm.

Design, synthesis and purification of X-ds100mers

The DNA probes containing a single modified site (C, 5mC
or 5hmC) were designed and synthesized (Supplementary
Figure S1). Briefly, oligo 1 (with one modified site of C, 5mC
or 5hmC), oligo 3, oligo 4 and oligo 5 were phosphorylated
at the 5' end and then annealed with unphosphorylated
oligos 2 and 6. The annealed oligos were ligated by T4 ligase to
form an intact X-ds100mer. In this design, oligo 5 was
used as the complementary strand to assist with the ligation
of oligo 1, oligo 2 and oligo 3 into one strand.

Firstly, oligo 1, oligo 3, oligo 4 and oligo 5 (150 pmol
each) were phosphorylated by 10 units of T4 PNK at 37°C
for 2 h. The solution was buffered with 1× ligation buffer
(New England Biolabs, Ipswich, MA, USA). The total
volume was about 35 µl. Next, the T4 PNK was denatured by
heating at 70°C for 10 min. Then, two additional oligos
(oligos 2 and 6; 150 pmol each) and 1-mM ATP were added
to the solution, and the solution was heated at 70°C for 10
min, followed by cooling at room temperature to anneal
the six oligos. Lastly, ligation was initiated by adding 8.5
Weiss units of T4 DNA ligase and was carried out overnight
at 16°C. The control probe was synthesized according to
the same procedure, but 5hmC-oligo 1 was replaced with
the oligo 1 containing a C or 5mC at the position of the
5hmC. The ligation products were purified using 16% native
PAGE. The bands were visualized by fluorescence
staining and cut from the gel for recovery. The other procedures
were the same as those for the purification of oligo 1.

Glucosylation of 5hmC-dsDNA probes or genomic DNA 

All the glucosylation reactions were performed using β-GT
provided with the 5hmC and 5mC Analysis Kit (NEB
EpiMark™).

The 5hmC-double-stranded DNA (dsDNA) probes (2.5
µg each) were glucosylated using 30 units of β-GT for 18
h at 37°C in a reaction solution with a total volume of 40
µl. This solution contained 4.0 µl of 10× NEBuffer 4 and 1.6
µl of 80-µM UDP-glucose (uridine-5'-diphosphoglucose),
and was supplemented with 24.4 µl of nuclease-free water. After
the glucosylation, the enzyme β-GT was digested at 40°C
for 30 min by adding 1.0 µl of proteinase K (20 mg/ml) to the
solution. Then, the added proteinase K was inactivated by
heating at 95°C for 10 min. The glucosylated DNA was then
mixed with 2 volumes of ice-cold ethanol and 0.3 volume
of 100-mM MgCl2, and placed at −20°C for 1.5 h. Then,
DNA was precipitated by centrifugation at 12 000 rpm for
10 min at 4°C. The supernatant was discarded, and
the DNA pellet was washed once or twice with 70% ethanol.
The DNA pellets were collected, air-dried and re-suspended
in a phosphate buffer (100-mM Na2HPO4 buffer, pH 8.5,
plus 50-mM NaCl). The final DNA solutions were
quantified using a NanoDrop 2000 at 260 nm.

The genomic DNA (5.0 µg) was glucosylated using 40
units of β-GT for 18 h at 37°C in a reaction solution with
a total volume of 100 µl. The solution comprised 10.0 µl
of 10× NEBuffer 4 and 4.0 µl of 80-µM UDP-glucose, and
was supplemented with 86.0 µl of nuclease-free water. The other
steps were the same as described above.

The characterization of the 5ghmC-dsDNA probe

The DNA products were digested into mononucleosides
and subjected to ultra-high-performance liquid
chromatography-tandem mass spectrometry (UHPLC-MS/MS) analysis as
described previously (14,39). Briefly, the UHPLC-MS/MS
analysis was performed on an Agilent 1290 UHPLC system
coupled with a G6410B triple quadrupole mass
spectrometer with an electrospray ionization source (Agilent
Technologies, Santa Clara, CA, USA). A reversed-phase
Zorbax Eclipse Plus C18 column (100 mm × 2.1 mm i.d.,
1.8-µm particle size, Agilent Technologies) was used for
separation. MassHunter workstation software version B.01.03
was used for data acquisition. The mobile phase consisted
of 5.0% methanol and 95% water (plus 0.1% formic acid)
and was used for UHPLC separation of the
mononucleosides at a flow rate of 0.3 ml/min.

BA-mediated PCR assay

To develop BA-mediated quantitative PCR (qPCR)
analysis, we tested BA and four BA derivatives using four dsDNA
probes (C-ds100mer, 5mC-ds100mer, 5hmC-ds100mer and
5ghmC-ds100mer).

Firstly, an aqueous solution of double-stranded
oligodeoxynucleotides or their mixture (1.0 µl, 0-125
nM) was mixed with the stock solution of BA or BA
derivatives (3.1 µl, 40 mM), 2× GoTaq® SYBR Green
qPCR Master Mix (12.5 µl; Promega, Madison, WI,
USA), 10-µM forward and reverse primers (0.5 µl each),
200-mM Na2HPO4 buffer [pH 8.5 (5.0 µl)] and 400-mM
NaCl (3.0 µl). As a control without any BA, 1× BA
stock buffer of the same volume (3.1 µl) was added to the
solution instead. The solutions were supplemented with
nuclease-free water to the total volume of 25.0 µl. The
qPCR was performed using a Mx3005P Real-Time qPCR
system (Stratagene, La Jolla). The cycling conditions were
set as: initial denaturation at 95°C for 2.0 min and then
40 cycles of PCR including denaturation at 95°C for 15 s,
annealing at 57°C for 15 s and extension at 25°C for 60 s. Each
sample was analyzed in triplicate. The final PCR products
were also separated by 16% native PAGE. Another was
mixed with the phosphate buffer (1:1, volume).
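
For orientation, the working concentrations implied by the stock volumes above can be worked out with the usual dilution relation C1·V1 = C2·V2. The sketch below is only a convenience calculation; the function name is hypothetical, and it assumes a final volume of exactly 25.0 µl as stated in the text.

```python
# Hedged sketch: final concentrations in the 25.0-µl qPCR mixture,
# computed from the stock concentrations and volumes quoted above.

def final_concentration(stock_conc, stock_volume_ul, total_volume_ul):
    """Dilution relation C2 = C1 * V1 / V2 (units follow stock_conc)."""
    return stock_conc * stock_volume_ul / total_volume_ul


TOTAL_UL = 25.0  # assumed exact final volume

ba_mM = final_concentration(40.0, 3.1, TOTAL_UL)      # BA or BA derivative
primer_uM = final_concentration(10.0, 0.5, TOTAL_UL)  # each primer
nacl_mM = final_concentration(400.0, 3.0, TOTAL_UL)   # added NaCl only

print(f"BA: {ba_mM:.2f} mM, primer: {primer_uM:.2f} µM, NaCl: {nacl_mM:.1f} mM")
```

Under these assumptions the BA derivative is present at roughly 5 mM during amplification; salts contributed by the master mix or the BA stock buffer are not counted here.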

The abundance of gene-specific 5hmC in genomic DNA
was exclusively evaluated using 2-CB-PBA-mediated qPCR
analysis. The glucosylated or un-glucosylated dsDNA
samples were prepared for qPCR analysis following the same
protocol as described above, except that the total amount
of genomic DNA was about 10 ng.

Note: the stock solutions of BA or BA derivatives (40
mM) were prepared in advance by dissolving BA or BA
derivatives in 200-mM Na2HPO4 buffer, pH 8.5 and 10% (v/v)
methanol solution (1× BA stock buffer). The dissolution
of BA or BA derivatives was assisted by brief
ultrasonication.



Cell culture and DNA extraction 

The cells were cultured and genomic DNA was extracted
as described previously (14,15). Mouse embryonic stem
(ES) cells of wild type (WT; 129 SvEv) and of Tet1/Tet2
double knockout (Tet1/Tet2−/−; a gift from Dr Guoliang
Xu at the State Key Laboratory of Molecular Biology,
Shanghai, China) were maintained in Dulbecco's
modified Eagle's medium (HyClone, Thermo Fisher Scientific
Inc., MA, USA) supplemented with 20% ES FBS (Gibco,
Life Technologies Corporation, Grand Island, NY, USA),
0.1-mM non-essential amino acids, 2-mM L-glutamine,
0.1-mM β-mercaptoethanol, 1000 units/ml leukemia
inhibitory factor (Millipore, Billerica, MA, USA), 1.0-µM
PD 0325901 (Stemgent, Cambridge, MA, USA) and
3.0-µM CHIR 99021 (Stemgent, Cambridge, MA, USA). Cells
were trypsinized and plated in culture dishes pre-treated
with 0.1% gelatin, then incubated in a humidified 37°C
incubator supplied with 5.0% CO2. After 24-h treatment, the
cells were collected for DNA extraction. Genomic DNA (20
µg) was extracted from the harvested mouse ES cells by a
Genomic DNA Purification Kit (Promega, Madison, WI,
USA) following the manufacturer's instructions. The
concentration and quality of the extracted genomic DNA were
evaluated by measuring the absorbance at 260 nm and 280
nm. To assure the quality of the extracted genomic DNA,
the ratio of absorbance at 260 nm to that at 280 nm was required
to be about 1.80.
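
The absorbance-based quality check can be sketched as follows. This is an illustrative helper rather than the authors' procedure: the conversion of 50 ng/µl per A260 unit (1-cm path length) is the commonly used factor for dsDNA, and the ±0.1 tolerance around 1.80 is an assumption introduced here, since the text only says "about 1.80".

```python
# Hedged sketch of a spectrophotometric DNA quality check.
# Assumptions: 50 ng/µl per A260 unit for dsDNA at 1-cm path length;
# a tolerance of ±0.1 around the target ratio (not from the source).

def dsdna_conc_ng_per_ul(a260, dilution_factor=1.0):
    """Estimate dsDNA concentration from absorbance at 260 nm."""
    return a260 * 50.0 * dilution_factor


def purity_ok(a260, a280, target=1.80, tolerance=0.1):
    """Return True if the A260/A280 ratio is close to the target."""
    return abs(a260 / a280 - target) <= tolerance


print(dsdna_conc_ng_per_ul(0.5))  # concentration estimate in ng/µl
print(purity_ok(0.90, 0.50))      # ratio 1.80
print(purity_ok(0.90, 0.60))      # ratio 1.50
```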

MspI-qPCR assay 

We performed this assay following the manufacturer's
instructions
(https://www.neb.com/nebecomm/tech_reference/epigenetics/epimark.asp).
Briefly, 5ghmC-ds100mer or glucosylated genomic DNA (100 ng)
was incubated with 10 units of MspI at 37°C. After 12-h
enzymatic digestion, 1.0 µl of proteinase K was added to each
tube and incubated at 40°C for 30 min. Then, proteinase K was
inactivated by heating at 95°C for 10 min. The digested
products were purified by ice-cold ethanol precipitation, and
then re-suspended, quantified and analyzed by qPCR. An
aliquot that did not undergo MspI digestion was used as a
control. The ΔCt value was obtained by subtracting the Ct value
of the MspI-digested aliquot from that of the control.

RESULTS 

Design and synthesis of 5hmC-containing oligonucleotide
probes

We first designed and synthesized four X-ds100mer probes
(X = C, 5mC, 5hmC or 5ghmC; Supplementary Figures S1
and S2 and Supplementary Table S1). These probes
comprised a double-stranded oligonucleotide of 100 bp
containing a C, 5mC, 5hmC or 5ghmC located at the 58th
nucleotide from the 5' end (Figure 1a). The 5mC or 5hmC
was incorporated precisely into the 58th position of the
probes using a well-designed primer extension reaction
(Supplementary Figure S1), and no modified C was
incorporated at any other position. Therefore, the 5mC- and
5hmC-ds100mers contained only a single 5mC and 5hmC,
respectively. The C-ds100mer did not contain any 5mC or
5hmC and could be obtained simply by commercial
synthesis. UHPLC-MS/MS analysis validated the presence of
5mC and 5hmC in the corresponding ds100mers (data not
shown). The 5ghmC-ds100mer was obtained by
glucosylation of the 5hmC-ds100mer using T4 phage β-GT.

Inhibition of BA on replication action of DNA polymerases 

We next tested the effect of BA on the replication of
5ghmC-ds100mer by Taq DNA polymerase. qPCR was used to
evaluate the replication activity. The cycle threshold (Ct) is
the cycle number at which the fluorescence signal equals the
defined threshold (40). We observed that BA increased the Ct
value of 5ghmC-ds100mer from the BA-absent level (18.0) to
20.3 (Figure 1b). Because Ct values are inversely related to
the logarithm of the amount of amplifiable DNA template in
the sample, the observed increase in Ct values suggests that
BA reduces the amount of glucosylated 5hmC template that can
be amplified by Taq DNA polymerase. This result indicates that
BA could partially block the replication of glucosylated 5hmC
in the ds100mer. In contrast, 5ghmC-ds100mer displayed
the same qPCR amplification curve as 5hmC-ds100mer
(unglucosylated), 5mC-ds100mer and C-ds100mer in the
absence of BA (Figure 1c), suggesting that the glucosylation
of 5hmC in DNA itself cannot affect the replication
activity of Taq DNA polymerase and further supporting
a critical role for BA in inhibiting the replication activity of
Taq DNA polymerase on 5ghmC-containing DNA. Moreover,
BA did not change the qPCR amplification curves
of 5mC-ds100mer, C-ds100mer and unglucosylated
5hmC-ds100mer (Figure 1d), showing the high specificity of BA
in blocking the replication of glucosylated 5hmC by Taq DNA
polymerase. This is the first report of BA blocking the
replication activity of DNA polymerase by interacting with
modified bases.

Figure 1. Boronic acid (BA) specifically inhibits the amplification
activity of Taq DNA polymerase on 5ghmC-containing dsDNA. (a) The
sequences of oligonucleotides containing modified cytosines, where X
indicates C, 5mC, 5hmC or 5ghmC, and the underlined sequences correspond
to the forward and reverse primers that were used for PCR amplification.
(b) The qPCR curves of 5ghmC-ds100mer in the presence (dashed line) or
absence (solid line) of BA. (c) The qPCR curves of ds100mers containing
a C (black solid line), 5mC (black dashed line), 5hmC (black dense dashed
line) or 5ghmC (gray dashed line) at the 58th nucleotide in the absence of
any BAs; gray solid lines indicate the PCR products of NTC (No Template
Control). The solid straight lines indicate the defined threshold of qPCR.
(d) The qPCR curves of C-ds100mer, 5mC-ds100mer and unglucosylated
5hmC-ds100mer in the presence of BA (indicated by the gray solid line, gray
dashed line and gray dense dashed line) or absence of BA (indicated by the
black solid line, black dashed line and black dense dashed line).
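
To give a feel for what a Ct shift of this size means, the sketch below converts a ΔCt into an apparent fold reduction in amplifiable template. It is a back-of-the-envelope illustration only: it assumes ideal amplification (template doubling every cycle, i.e. an efficiency of 1.0), which the source does not state, and the function names are hypothetical.

```python
# Hedged sketch: apparent fold reduction in amplifiable template
# implied by a Ct shift, assuming ideal doubling per cycle.

def delta_ct(ct_with_ba, ct_without_ba):
    """ΔCt as used in the text: Ct with BA minus Ct without BA."""
    return ct_with_ba - ct_without_ba


def apparent_fold_reduction(dct, efficiency=1.0):
    """Fold reduction = (1 + efficiency) ** ΔCt; efficiency=1.0 means doubling."""
    return (1.0 + efficiency) ** dct


dct_ba = delta_ct(20.3, 18.0)  # Ct values reported for BA in Figure 1b
fold_ba = apparent_fold_reduction(dct_ba)
print(f"ΔCt = {dct_ba:.1f}, apparent fold reduction ≈ {fold_ba:.1f}")
```

Under the ideal-doubling assumption, the ΔCt of 2.3 observed for BA corresponds to roughly a five-fold drop in effectively amplifiable 5ghmC template.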

On the basis of the above finding, we proposed a novel
assay for rapid and sensitive detection of gene- or
fragment-specific 5hmC, termed the BA-mediated PCR assay. This
assay does not require any restriction endonucleases and
should be applicable to the detection of any sequence in
genomic DNA.

Testing of BA derivatives for developing 5ghmC-sensitive
qPCR assay

As described above, BA can inhibit the amplification of
5ghmC-containing DNA by Taq DNA polymerase. However,
the inhibition is limited as shown by qPCR analysis
(ΔCt = 2.3). To improve the BA-mediated PCR assay, we
further tested four BA derivatives, namely PBA, 3-CPBA,
2-CB-PBA and 3-D-PBA. Remarkably, we observed that
these BA derivatives could increase the Ct values of
5ghmC-ds100mer from the BA-absent level (18.0) to 21.9-24.0
(ΔCt = 2.9-6.0), displaying higher blocking efficiency than BA.

Figure 2. The effects of four BA derivatives on Taq DNA polymerase
replicating 5ghmC-ds100mer. (a) qPCR curves in the presence (dashed
line) or absence (solid line) of BA derivatives. (b) ΔCt values for C-,
5mC-, 5hmC- and 5ghmC-containing DNA in the presence and
absence of BA derivatives. ΔCt is the difference between the Ct value
with BA derivatives and without BA derivatives. Four BA derivatives
are shown: 2-CB-PBA (2-(2'-chlorobenzyloxy) phenylboronic acid),
3-CPBA (3-chlorophenylboronic acid), PBA (phenylboronic acid) and
3-D-PBA (3-(Dansylamino) phenylboronic acid), from top to bottom. Error
bars represent the standard deviation from the mean of three independent
experiments.

Similarly, among the four synthesized probes (i.e. the C-,
5mC-, 5hmC- and 5ghmC-ds100mers), only the
5ghmC-ds100mer can be specifically discriminated by qPCR in the
presence of one of the four BA derivatives (Figure 2b).

As shown in Figure 2a and b, 2-CB-PBA displays the
highest inhibition efficiency (ΔCt = 6.0) among the five
tested BAs. Moreover, non-specific blocking of the other three
probes (i.e. the C-, 5mC- and 5hmC-ds100mers) was not
observed for 2-CB-PBA. In the following work, only
2-CB-PBA was used in the BA-mediated qPCR assay to detect
5hmC-containing DNA.

It is not known whether the relative placement of 5hmC
in DNA affects 2-CB-PBA-mediated qPCR efficiency. To
test this possible effect, we synthesized three additional
5ghmC-ds83mer probes by placing a single 5ghmC at the
29th, 32nd or 64th nucleotide from the
5' end (Figure 3a). All three
ds83mers with the different placements of 5ghmC generated
identical ΔCt values (ΔCt = 5.7; Figure 3b) in response
to 2-CB-PBA. Therefore, the relative placement of the
5ghmC in DNA has no effect on the replication activity of
Taq DNA polymerase, suggesting that the efficiency of
2-CB-PBA-mediated qPCR is mainly dependent on the
presence of 5ghmC in DNA.

The next question we asked is whether the inhibition
of Taq DNA polymerase by 2-CB-PBA remains
specific to 5ghmC-containing DNA when it co-exists with
non-5ghmC DNA. For this purpose, we mixed the
5ghmC-ds100mer probe with a ds160mer without any modified site
(Supplementary Table S1) at a molar ratio of 1:1. The
total concentration of dsDNA was kept at about 10.0 nM.
As shown by melting curve analysis, the 5ghmC-ds100mer
and the ds160mer have very distinct melting temperatures of
84.4°C and 86.8°C, respectively. In the presence of
2-CB-PBA, only the amplification of the 5ghmC-ds100mer probe was
partially inhibited (ΔCt = 6.0, Tm = 84.4°C;
Supplementary Figure S3a). In contrast, the non-5ghmC ds160mer was
amplified without any inhibition (ΔCt = 0.1, Tm = 86.8°C;
Supplementary Figure S3b).

We further mixed the 5ghmC-ds100mer with genomic
DNA from mouse Tet1/Tet2-knockout ES cells. The double
knockout of Tet1 and Tet2 in mouse ES cells indeed
eliminates the 5hmC in genomic DNA (15). Similarly, the
amplification of 5ghmC-ds100mer was partially inhibited by
2-CB-PBA (Supplementary Figure S3c). This suggests that the
presence of genomic DNA does not interfere with the
partial inhibition of 5ghmC-ds100mer amplification
caused by 2-CB-PBA.

In some regions of genomic DNA, 5hmCs are
clustered; it is therefore helpful to test oligonucleotides
containing multiple 5hmCs. For this purpose, we tested
six ds83mers containing multiple hemi-hydroxymethylated
CpG sites or symmetrically hydroxymethylated CpG sites
(Figure 4a and c). All these ds83mers were glucosylated
before PCR analysis. For hemi-hydroxymethylated ds83mers,
the ΔCt value increases from 6.0 to 6.99 with increasing
5hmC number (1-3) (Figure 4b). For symmetrically
hydroxymethylated ds83mers, the ΔCt value increases from 6.5 to
9.0 with increasing 5hmC pair number (1-3) (Figure 4d).
The results clearly show that the blocking efficiency is
positively correlated with the amount of 5hmC within the tested
regions.

Quantitative evaluation of 5hmC

To demonstrate that our BA-mediated PCR assay can
potentially quantify 5hmC levels, we diluted the
5ghmC-ds100mer with 5hmC-ds100mer at molar ratios
of 0:1, 1:16, 1:8, 1:4, 1:2, 1:1, 2:1, 4:1 and 1:0
(5ghmC-ds100mer:5hmC-ds100mer). In this case, the
5ghmC-ds100mer probe was obtained from the complete
glucosylation of 5hmC-ds100mer by T4 phage β-GT (Supplementary
Figure S4). The total concentration of the two ds100mers
was kept at 5.0 nM. As shown in Figure 5a and b, the ΔCt
value increased (from 0 to 6.0) with the increasing content
of 5ghmC in the ds100mers. A linear relationship was
observed between the ΔCt values and log [5ghmC/ds100mer]
(ΔCt = 6.14 + 4.87 log [5ghmC/ds100mer], R² = 0.97;
Figure 5b). Interestingly, the 5ghmC-containing oligo could be
sensitively detected even when mixed with 16-fold more
unglucosylated 5hmC-containing oligo (Figure 5a). Native
gel electrophoresis analysis of the final qPCR products was
not as sensitive as qPCR, but clearly indicated that
2-CB-PBA reduced the product yield in the presence of 5ghmC
(Supplementary Figure S5).
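
The reported regression can be used in both directions, as the following sketch illustrates. It rests on two assumptions that are not spelled out in the text: the logarithm is taken to be base 10, and [5ghmC/ds100mer] is read as the molar fraction of 5ghmC-ds100mer in the total ds100mer pool. The function names are hypothetical.

```python
# Hedged sketch of the calibration ΔCt = 6.14 + 4.87 * log10(f),
# where f is assumed to be the molar fraction of 5ghmC-ds100mer.
import math

INTERCEPT = 6.14  # from the regression quoted in the text
SLOPE = 4.87      # from the regression quoted in the text


def predicted_delta_ct(fraction):
    """Predict ΔCt for a given 5ghmC fraction (0 < fraction <= 1)."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return INTERCEPT + SLOPE * math.log10(fraction)


def fraction_from_delta_ct(dct):
    """Invert the calibration to estimate the 5ghmC fraction."""
    return 10.0 ** ((dct - INTERCEPT) / SLOPE)


# Mixing ratio 1:16 (5ghmC:5hmC) corresponds to a fraction of 1/17.
print(round(predicted_delta_ct(1.0 / 17.0), 2))
print(round(fraction_from_delta_ct(3.0), 3))
```

Read this way, the fit gives a ΔCt close to zero at the 1:16 dilution and about 6 for fully glucosylated probe, consistent with the range described above; outside the calibrated range the inversion should not be trusted.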

As described above, our 2-CB-PBA-mediated PCR assay
can distinguish the 5ghmC-ds100mer in the presence
of 16-fold more unglucosylated 5hmC-ds100mer
(Figure 5). In contrast, the restriction enzyme MspI-based
approach failed to detect glucosylated 5hmC-ds100mer
(ΔCt = 0; Figure 5c), even without any dilution by
the other ds100mers. This is expected because the
recognition of 5hmC is solely dependent on MspI, which
can cleave sites containing 5mC or 5hmC but not
glucosylated 5hmC. Given that 5hmC cleavage occurs only in
the context of CCGG sequences
(https://www.neb.com/nebecomm/tech_reference/epigenetics/epimark.asp),
glucosylated 5hmC-ds100mers without any 5hmC in a CCGG
sequence cannot be detected by the MspI-based approach.
Notably, earlier work showed that a duplex containing
either 5hmC or 5mC at the inner position of CCGG is
cleaved by MspI; however, if the 5hmC/5mC is placed at
the outer cytosine position, the modified strand is protected
from MspI whereas the unmodified strand is cleaved (18).
Therefore, it is also possible that the sites containing 5mC
and/or unglucosylated 5hmC cannot be cleaved by MspI.
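
The sequence constraint of the MspI-based approach can be made concrete with a small sketch that checks whether a modified cytosine falls inside a CCGG site, and at which cytosine. The sequence used is a made-up toy example, not one of the probes from this study, and the function names are hypothetical.

```python
# Hedged sketch: locate CCGG sites and classify a modified cytosine
# as the outer or inner C of a site. Positions are 1-based.

def mspi_sites(seq):
    """Return 1-based start positions of every CCGG in seq."""
    return [i + 1 for i in range(len(seq) - 3) if seq[i:i + 4] == "CCGG"]


def classify_position(seq, position):
    """Return 'outer', 'inner' or None for a modified C at `position`."""
    for start in mspi_sites(seq):
        if position == start:
            return "outer"   # first C of CCGG
        if position == start + 1:
            return "inner"   # second C of CCGG (the CpG cytosine)
    return None              # not within any CCGG site


TOY_SEQ = "ATGCCGGTTACGCAT"  # hypothetical sequence for illustration

print(mspi_sites(TOY_SEQ))
print(classify_position(TOY_SEQ, 5))
print(classify_position(TOY_SEQ, 11))
```

A modified cytosine that returns `None` here, such as the CpG at position 11 of the toy sequence, lies outside every CCGG site and would be invisible to an MspI-based readout, which is the limitation the BA-mediated assay is meant to avoid.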

Detection of gene-specific 5hmC in genomic DNA by
2-CB-PBA-mediated qPCR assay

We then asked whether the BA-mediated PCR assay could be
applied to the 5hmC detection of some specific genes or
regions in genomic DNA. To this end, we tested three
intron regions from a B cell transcription factor gene
Pax5 (marked as intron_Pax5_1, intron_Pax5_2 and
intron_Pax5_3) in mouse ES cell genomic DNA. As measured
by 5hmC-deep sequencing (15, GSE43262) in our recent
work, these regions display differential 5hmC abundance
(evaluated by peak values: 246, 798 and 1534) in mouse ES
cell genomic DNA. We also chose a region from UTR-5
of Srr (marked as UTR_Srr) that displays negligible 5hmC
(peak value: 13) as an inner control. In this experiment,
the optimized approach (2-CB-PBA-mediated qPCR assay)
with higher sensitivity was used to test the 5hmC abundance
in the chosen regions of genomic DNA of mouse ES cells.
Our assay of the glucosylated genomic DNA from mouse
ES cells showed an increased Ct value for all three intron
regions of Pax5 (ΔCt = 1.5, 3.6 and 7.5), indicating the
presence of 5hmC (Figure 6a). In contrast, no significant
change in the Ct value (ΔCt = 0.1) was observed for the
chosen region of UTR-5 of Srr. As measured by our
approach, the three intron regions of Pax5 in glucosylated
genomic DNA display differential 5hmC abundance.
The obtained ΔCt values are well correlated with those of
chemical labeling sequencing analysis (R~ — 0.99; Figure 
6b). The amplification curve and melting curve suggest that 






Nucleic Acids Research, 2014, Vol. 42, No. 9 e81



[Figure 3 graphic: (a) sequence of the 5hmC-ds83mer probe with the three 5ghmC sites marked; (b) bar chart of ΔCt values (axis 0.0 to 7.0) for each 5ghmC site.]

Figure 3. The effect of relative placement of the 5ghmC site on the replication activity of Taq DNA polymerase. (a) The sequences of the 5hmC-ds83mer
probes used for the 2-CB-PBA-mediated PCR assay. We synthesized three 5ghmC-ds83mer probes by placing a single 5ghmC at the 29th nucleotide (①), 32nd
nucleotide (②) or 64th nucleotide (③) counted from the 5' end (Figure 3a). The underlined sequences correspond to the forward and reverse primers that were
used for PCR amplification. (b) ΔCt values for the different placements of 5ghmC in the ds83mer DNA probes in the presence and absence of 2-CB-PBA.
Figure 4. The effect of placement of multiple 5hmC sites on the replication activity of Taq DNA polymerase using the 2-CB-PBA-mediated PCR assay.
(a) and (c): Six glucosylated ds83mers containing multiple hemi-hydroxymethylated CGs (a) or symmetrically hydroxymethylated CGs (c) at the 29th, 32nd
and 34th nucleotides from the 5' end were used as indicated. (b) and (d): ΔCt values for the hemi-5ghmC (b) or symmetry-5ghmC-dsDNA probes (d) in the
presence and absence of 2-CB-PBA. Error bars represent the standard deviation from the mean of three independent experiments. X indicates a 5ghmC site.
The sequence of the 5hmC-ds83mers was the same as listed in Figure 3a. The 5hmC-ds83mers were glucosylated as described in the Materials
and Methods section.



the amplification of intron_Pax5_3 is completely inhibited in
the presence of 2-CB-PBA (Supplementary Figure S6a and
b). Gel electrophoresis of the final PCR products also
confirmed the predominant inhibition of the amplification of
glucosylated intron_Pax5_3 by 2-CB-PBA (Supplementary
Figure S6c). Consistently, both our current approach and
chemical labeling-Seq analysis show that the intron_Pax5_3
region (110 bp) displays the highest 5hmC abundance.
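
The reported agreement between the assay and sequencing can be checked approximately from the numbers quoted in the text. The following is a minimal sketch, not the authors' analysis: it assumes a plain Pearson correlation on untransformed values and includes the UTR_Srr control as a fourth point, neither of which is stated in the text. The helper name `pearson_r` is illustrative.

```python
from math import sqrt

# Values quoted in the text, in the order UTR_Srr (control), then the
# three Pax5 intron regions. Including the control point is an assumption.
peak_values = [13, 246, 798, 1534]   # 5hmC-sequencing peak values
delta_ct = [0.1, 1.5, 3.6, 7.5]      # ΔCt from the 2-CB-PBA-mediated qPCR assay


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(peak_values, delta_ct)
r_squared = r * r
print(f"R^2 = {r_squared:.3f}")
```

Under these assumptions the computed coefficient of determination is close to the reported value of 0.99; dropping the control point changes it only slightly.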

By mixing WT genomic DNA with genomic DNA from mutant
cells depleted of Tet1 and Tet2 in varying ratios, we
observed that the 5hmC-abundant PAX5_3 could
be detected at a ratio of 1:64 (Supplementary Figure S7).

Figure 5. Quantitative evaluation of 5ghmC-ds100mer by 2-CB-PBA-mediated PCR. (a) The ΔCt values of mixed dsDNA templates (5ghmC-ds100mer
and 5hmC-ds100mer) with varying molar ratios of 0:1, 1:16, 1:8, 1:4, 1:2, 1:1, 2:1, 4:1 and 1:0. (b) The linear relationship between the ΔCt value and
log [5ghmC/ds100mer] (ΔCt = 6.14 + 4.87 log [5ghmC/ds100mer], R² = 0.97). (c) Comparison of the 2-CB-PBA-mediated PCR assay and MspI-qPCR
analysis of 5ghmC-ds100mer.
Figure 6. The 2-CB-PBA-mediated qPCR assay for fragment-specific detection of 5hmC in the indicated intron_Pax5 regions of genomic DNA of
mouse embryonic stem (ES) cells. (a) The ΔCt values obtained by the 2-CB-PBA-mediated qPCR assay. (b) The correlation of the 2-CB-PBA-mediated
qPCR with Chem-Seq analysis of 5hmC. (c) The ΔCt values obtained by the MspI-qPCR assay. The primers designed for amplification of the target DNA
regions are listed in Supplementary Table S2. UTR_Srr was included as an internal control for qPCR analysis. Error bars represent the standard deviation
from the mean of at least three experiments. 'Glu' indicates the glucosylation of DNA by β-GT. Tet-KO indicates the double knockout of Tet1 and Tet2.



These results suggest that our approach is sensitive for the
detection of 5hmC-abundant genes. The better sensitivity
observed for detection of 5hmC in PAX5_3 (Supplementary Figure
S7 versus Figure 5) is attributed to the presence of multiple
5hmC sites in the PAX5_3 region.
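
For reference, the single-site ds100mer calibration line reported in the Figure 5 caption can be evaluated and inverted as in the sketch below. This is illustrative only: it assumes the logarithm is base 10 and that the bracketed quantity is the molar ratio of 5ghmC-ds100mer to 5hmC-ds100mer. The function names are hypothetical, and the fit describes the model probe, not multi-site genomic regions such as PAX5_3.

```python
from math import log10

# Fit parameters quoted in the Figure 5 caption (log base 10 is assumed).
INTERCEPT = 6.14
SLOPE = 4.87


def delta_ct_from_ratio(ratio):
    """Predicted ΔCt for a given 5ghmC-ds100mer : 5hmC-ds100mer molar ratio."""
    return INTERCEPT + SLOPE * log10(ratio)


def ratio_from_delta_ct(delta_ct):
    """Invert the calibration line to estimate the molar ratio from ΔCt."""
    return 10 ** ((delta_ct - INTERCEPT) / SLOPE)


# Evaluate the line at a few of the mixing ratios used for the ds100mer series.
for ratio in (1 / 16, 1 / 4, 1, 4):
    print(f"ratio {ratio:.4f} -> predicted ΔCt {delta_ct_from_ratio(ratio):.2f}")
```

On this line a 1:16 mixture of the single-site probe gives a predicted ΔCt near zero, which is consistent with the explanation above that detection at 1:64 in genomic DNA relies on multiple 5hmC sites in the amplified region.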

Moreover, the Ct values of unglucosylated DNA are essentially
unchanged (ΔCt = 0.1-0.3; Figure 6a), suggesting that
the selected PCR amplification regions do not contain any
spontaneous 5ghmC in these cells. Therefore, the 5ghmC
detected by the 2-CB-PBA-mediated qPCR assay results
exclusively from the β-GT-catalyzed glucosylation of 5hmC in
the extracted genomic DNA of mouse ES cells, confirming
the specificity of the BA-mediated PCR approach.



After knockout of Tet1/Tet2, which mediate 5mC oxidation
and 5hmC formation, no 5hmC was detected in any of the three
intron regions of Pax5 (Figure 6a), confirming the reliability
of our approach.

To verify the depletion of Tet1 and Tet2, we measured
the expression of Tet1 and Tet2 at the mRNA level in the
Tet1/Tet2 double knockout ES cells using qPCR. Indeed,
the levels of Tet1 and Tet2 expression in Tet1/Tet2 double
knockout ES cells were reduced by two to three orders of
magnitude (Supplementary Figure S8) compared with
those in WT ES cells.
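
To relate "two to three orders of magnitude" to qPCR readouts, the sketch below uses the common 2^(-ΔΔCt) relative quantification convention. This is an assumption: the text does not state how expression was quantified, and the formula presumes roughly 100% amplification efficiency and normalization to a reference gene. The function names are hypothetical.

```python
from math import log2


def relative_expression(ddct):
    """Relative expression under the 2^(-ΔΔCt) convention (assumes ~100% efficiency)."""
    return 2.0 ** (-ddct)


def ddct_for_fold_reduction(fold):
    """ΔΔCt (in cycles) that corresponds to a given fold reduction in expression."""
    return log2(fold)


for fold in (100, 1000):
    ddct = ddct_for_fold_reduction(fold)
    print(f"{fold}-fold reduction ~ ΔΔCt of {ddct:.1f} cycles "
          f"(relative expression {relative_expression(ddct):.4f})")
```

Under this convention, a 100-fold to 1000-fold reduction corresponds to a shift of roughly 6.6 to 10 cycles between knockout and WT samples.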

We further tested the three intron regions of Pax5 using
the MspI-based restriction-glucosylation assay on genomic
DNA. 5hmC could not be detected in any of the three
regions (Figure 6c). These regions do not have any CCGG
site, and their Ct values for unglucosylated and glucosylated
DNA are essentially unchanged (ΔCt < 0.7; Figure
6c). These results illustrate the limitation of the restriction
endonuclease approach, which requires cleavable sites. In
contrast, our assay shows the differential distribution of 5hmC
in the three regions, which is consistent with 5hmC-deep
sequencing. Therefore, our assay does not exhibit such a
limitation and can be used to detect 5hmC in all of the selected
regions of genomic DNA.

To evaluate the sensitivity of our approach and show
its biologically relevant applications, we measured the 5hmC
distribution of 10 regions in human MRC-5 cells, which
have naturally low expression of all three TET proteins.
The 5hmC abundance in human MRC-5 cells is comparable
with that of most human tissues. The tested regions
and the sequences of the designed primers are listed in
Supplementary Table S3. By the 2-CB-PBA-mediated PCR
method, we observed that three regions (one CDS_UBC,
one INTRON_GOSR2 and one INTRON_FBXO32)
exhibit the highest 5hmC abundance (ΔCt = 3.2) among the
10 tested regions (Supplementary Figure S9). Consistently,
5hmC DIP-Seq analysis [GSE44457, (15)] showed that the
three regions also have the highest 5hmC abundance (peak
values: 850-1158). Together, we conclude that our approach
is sensitive for the detection of gene-specific 5hmC in
biologically relevant samples.

DISCUSSION 

Here, we demonstrate that all unglucosylated ds100mers are
amplified normally by Taq DNA polymerase regardless
of whether BA or any of its derivatives is present. This
result indicates that BA and its derivatives cannot directly
inhibit or block the amplification activity of Taq DNA
polymerase. Such amplification activity is partly blocked
only when the 5hmC in template DNA is modified by one
glucose molecule and when BA or one of its derivatives
is present. Therefore, the observed blocking of Taq DNA
polymerase by glucosylated 5hmC can be attributed to the
specific interaction of BA (or its derivatives) with the
glucosylated 5hmC, which most likely leads to the formation of a
bulky covalent but dynamic complex of BA-glucose-5hmC
(41-45).

Here, we also demonstrate the application of the BA-mediated
PCR assay for the detection of 5hmC in the Pax5
gene of mouse ES cell genomic DNA. Our results reveal the
highest abundance of 5hmC in the intron_Pax5 regions among
the 23 tested regions of genomic DNA of mouse ES cells
(Figure 6 and Supplementary Figure S10). Therefore, we
identified PAX5_3 as a 5hmC-rich fragment. Pax5 is a
transcription factor that regulates the development and function
of mature B cells (46). Pax5 is also implicated in chromosome
translocation, aberrant high frequency mutation and
aberrant methylation, which were found in hematopoietic
malignancies (lymphoblastic leukemia, myeloid leukemia)
(46). The role of the abundant 5hmC in Pax5 remains
unclear.

By our approach, the 23 regions of genomic DNA of
mouse ES cells were simultaneously measured in one batch
of tests using a PCR instrument with 96 channels (Figure 6
and Supplementary Figure S10). This indicates that our
assays are potentially high throughput.

In summary, we describe a novel DNA polymerase
amplification assay for distinguishing 5hmC from 5mC without
the need for restriction enzymes. This was achieved by
introducing BAs to block replication by DNA polymerase
on glucosylated 5hmC. Theoretically, the assay can be
applied to detect 5hmC in any DNA sequence, eliminating
the sequence limitation of recently developed
restriction-glucosylation assays. Essentially, our study opens a new way
of detecting 5hmC for chemical, biological, medical and
pharmacological studies.